| Name | Version | Summary | Date |
| --- | --- | --- | --- |
| vllm | 0.8.0 | A high-throughput and memory-efficient inference and serving engine for LLMs | 2025-03-18 22:22:33 |
| vllm-xft | 0.5.5.1 | A high-throughput and memory-efficient inference and serving engine for LLMs | 2025-02-20 06:55:00 |
| vllm-npu | 0.4.2 | A high-throughput and memory-efficient inference and serving engine for LLMs | 2025-01-16 04:12:48 |
| vllm-rocm | 0.6.3 | A high-throughput and memory-efficient inference and serving engine for LLMs with AMD GPU support | 2024-10-15 17:17:24 |
| vllm-flash-attn | 2.6.2 | Forward-only flash-attn | 2024-09-05 20:36:33 |
| vllm-acc | 0.4.21716571491.2888474 | A high-throughput and memory-efficient inference and serving engine for LLMs | 2024-05-24 17:35:42 |
| vllm-online | 0.4.2 | A high-throughput and memory-efficient inference and serving engine for LLMs | 2024-04-29 02:49:29 |
| tilearn-infer | 0.3.3 | A high-throughput and memory-efficient inference and serving engine for LLMs | 2024-04-22 03:24:24 |
| hive-vllm | 0.0.1 | a | 2024-02-28 19:44:57 |
| vllm-consul | 0.2.1 | A high-throughput and memory-efficient inference and serving engine for LLMs | 2023-10-26 07:04:42 |
| vllm-py | 0.0.1 | A high-throughput and memory-efficient inference and serving engine for LLMs | 2023-06-19 03:47:08 |
| woosuk-vllm-test | 0.1.1 | vLLM: Easy, Fast, and Cheap LLM Serving for Everyone | 2023-06-18 20:09:46 |
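A table like the one above can be assembled programmatically from PyPI's JSON API (`https://pypi.org/pypi/<name>/json`), whose `info` object carries the package's name, version, and summary; per-file upload timestamps live under the release metadata. Below is a minimal sketch of the field extraction, run against a hand-written sample dict (the `package_row` helper and the sample data are illustrative, not part of any library):

```python
def package_row(pypi_json: dict) -> dict:
    """Extract the Name / Version / Summary fields from a parsed
    PyPI JSON API response (https://pypi.org/pypi/<name>/json)."""
    info = pypi_json["info"]
    return {
        "name": info["name"],
        "version": info["version"],
        "summary": info["summary"],
    }

# Hand-written sample mirroring the shape of a (truncated) PyPI
# response for the "vllm" package; values match the table above.
sample = {
    "info": {
        "name": "vllm",
        "version": "0.8.0",
        "summary": ("A high-throughput and memory-efficient "
                    "inference and serving engine for LLMs"),
    }
}

row = package_row(sample)
print(f"| {row['name']} | {row['version']} | {row['summary']} |")
```

In a real script the `sample` dict would instead come from an HTTP GET of the API endpoint (e.g. via `urllib.request` or `requests`) followed by `json.loads`.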